POLI 572B

Michael Weaver

March 14, 2022

Inference with Least Squares

Objectives

Recap

  • Ordinary Least Squares assumptions for inference

Causal Inference with Regression

  • Omitted Variable Bias (link to conditioning)
  • Interpolation/Extrapolation Bias
  • Conditioning in Regression

Recap

Ordinary Least Squares Assumptions:

  1. \(Y\) is generated by \(\mathbf{X\beta} + \epsilon\) (model equation is correct)
  2. \(\epsilon_i\) are independent, identically distributed, with variance \(\sigma^2\) for all \(i\)
  3. \(X_i \perp \!\!\! \perp \epsilon_i\): \(X_i\) is independent of \(\epsilon_i\)

If these assumptions are true:

  • OLS \(\widehat{\beta}\) is an unbiased estimator for \(\beta\)
  • known sampling distribution of \(\widehat{\beta}\)
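A quick simulation (hypothetical data, not from the lecture; all coefficient values made up) illustrates the claim: when the model equation is correct and the regressor is independent of the error, \(\widehat{\beta}\) centers on the true \(\beta\) across repeated samples.

```r
# Sketch: under the three OLS assumptions, the estimator is unbiased.
set.seed(572)
beta1 <- 2
estimates <- replicate(2000, {
  x <- rnorm(100)
  y <- 1 + beta1 * x + rnorm(100)  # model correct; x independent of error
  coef(lm(y ~ x))[["x"]]
})
mean(estimates)  # centers on the true beta1 = 2
```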

A Causal Model for Least Squares

Response Schedule

In the context of experiments, each observation has potential outcomes corresponding to its behavior under different treatments

  • To estimate the \(ACE\) of treatment without bias, we assume that treatment status is independent of potential outcomes

Response Schedule

In regression, where levels of treatment might be continuous, we generalize this idea to the “response schedule”:

  • some equation that reflects potential outcomes across different values of the “treatment”
  • approximates the “average causal response function”:
    • how much potential outcomes of \(Y\) change with unit change in \(X\), on average

Response Schedule

\[Y_i(D_i = d) = \beta_0 + \beta_1 D_i + \epsilon_i\]

Here \(Y_i(D_i = d)\) is the potential outcome for \(i\) for a value of \(D = d\).

  • this response schedule says that \(Y_i\), on average, changes by \(\beta_1\) for a unit change in \(D\) (this may average across non-linear effects of \(D\))
  • We only ever observe one \(Y_i(D_i = d)\) for the \(d\) that occurs, other values are counterfactual.

Response Schedule

If we don’t know parameters \(\beta_0, \beta_1\), what do we need to assume to obtain an estimate \(\widehat{\beta}_1\) that we can give a causal interpretation? (On average, change in \(D\) causes \(\widehat{\beta}_1\) change in \(Y\))

We must assume

  • \(Y_i\) actually produced according to the response schedule (equation is correctly specified; e.g., linear and additive)
  • \(D_i\) is independent of \(\epsilon_i\): \(D_i \perp \!\!\! \perp \epsilon_i\). Sometimes we say \(D\) is exogenous

Response Schedule

Recall OLS assumptions: to have no bias…

  • \(D\) independent of \(\epsilon\)
  • and \(\epsilon\) captures all other variables that affect \(Y\), per the model. What is one way to be sure we can find the effect of \(D\) without bias?
  • \(D\) randomly assigned

But assumptions are violated

If the true process generating the data is:

\[Y_i = \beta_0 + \beta_1 D_i + \beta_2 X_i + \nu_i\]

with \((D_i,X_i) \perp \!\!\! \perp \nu_i\), \(E(\nu_i) = 0\), \(Var(\nu_i) = \sigma^2\)

What happens when we estimate this model with a constant and \(D_i\) but exclude \(X_i\)?

\[Y_i = \beta_0 + \beta_1 D_i + \epsilon_i\]

\[\small\begin{eqnarray} \widehat{\beta_1} &=& \frac{Cov(D_i, Y_i)}{Var(D_i)} \\ &=& \frac{Cov(D_i, \beta_0 + \beta_1 D_i + \beta_2 X_i + \nu_i)}{Var(D_i)} \\ &=& \frac{Cov(D_i, \beta_1 D_i)}{Var(D_i)} + \frac{Cov(D_i,\beta_2 X_i)}{Var(D_i)} + \frac{Cov(D_i,\nu_i)}{Var(D_i)} \\ &=& \beta_1\frac{Var(D_i)}{Var(D_i)} + \beta_2\frac{Cov(D_i, X_i)}{Var(D_i)} + 0 \\ &=& \beta_1 + \beta_2\frac{Cov(D_i, X_i)}{Var(D_i)} \end{eqnarray}\]

(the third term is zero because \((D_i, X_i) \perp \!\!\! \perp \nu_i\))

So, \(E(\widehat{\beta_1}) \neq \beta_1\): the estimate is biased (unless \(\beta_2 = 0\) or \(Cov(D_i, X_i) = 0\))
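The algebra above can be checked by simulation (made-up coefficients, not from the lecture): the short regression's slope lands at \(\beta_1 + \beta_2 \frac{Cov(D, X)}{Var(D)}\), not at \(\beta_1\).

```r
# Sketch (simulated data): true model has beta1 = 2, beta2 = 3,
# and D is correlated with X.
set.seed(572)
n <- 10000
x <- rnorm(n)
d <- 0.5 * x + rnorm(n)              # D and X correlated
y <- 1 + 2 * d + 3 * x + rnorm(n)    # true data-generating process
short <- coef(lm(y ~ d))[["d"]]      # regression that omits X
short                                # well above 2
2 + 3 * cov(d, x) / var(d)           # matches the OVB formula
```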

Omitted Variable Bias

When we exclude \(X_i\) from the regression, we get:

\[\widehat{\beta_1} = \beta_1 + \beta_2\frac{Cov(D_i, X_i)}{Var(D_i)}\]

This is omitted variable bias

  • recall: \(\beta_1\) is the effect of \(D\) on \(Y\)
  • recall: \(\beta_2\) is the effect of \(X\) on \(Y\)

Excluding \(X\) from the model: \(\widehat{\beta_1} = \beta_1 + \beta_2\frac{Cov(D_i, X_i)}{Var(D_i)}\)

What is the direction of the bias when:

  1. \(\beta_2 > 0\); \(\frac{Cov(D_i, X_i)}{Var(D_i)} < 0\)

  2. \(\beta_2 < 0\); \(\frac{Cov(D_i, X_i)}{Var(D_i)} < 0\)

  3. \(\beta_2 > 0\); \(\frac{Cov(D_i, X_i)}{Var(D_i)} > 0\)

  4. \(\beta_2 = 0\); \(\frac{Cov(D_i, X_i)}{Var(D_i)} > 0\)

  5. \(\beta_2 > 0\); \(\frac{Cov(D_i, X_i)}{Var(D_i)} = 0\)

Omitted Variable Bias

This only yields bias if two conditions are true:

  1. \(\beta_2 \neq 0\): omitted variable \(X\) has an effect on \(Y\)

  2. \(\frac{Cov(D_i, X_i)}{Var(D_i)} \neq 0\): omitted variable \(X\) is correlated with \(D\) (i.e., on the same backdoor path)

This is why we don’t need to include EVERYTHING that might affect \(Y\) in our regression equation; only those variables that affect both the treatment and the outcome.
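A small simulated check of that point (hypothetical variables): omitting a cause of \(Y\) that is unrelated to \(D\) leaves \(\widehat{\beta_1}\) unbiased.

```r
# Sketch: w affects Y but not D, so excluding it creates no OVB.
set.seed(572)
n <- 10000
w <- rnorm(n)                     # affects Y only
d <- rnorm(n)                     # unrelated to w
y <- 2 * d + 3 * w + rnorm(n)     # true effect of d is 2
coef(lm(y ~ d))[["d"]]            # still close to 2, despite omitting w
```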

Conditioning:

conditioning is when we examine the effect of \(D_i\) on \(Y_i\) within subsets/strata of the data defined by the values of \(\mathbf{X_i}\), blocking backdoor (non-causal) paths from \(D\) to \(Y\).

  • Where \(\mathbf{X_i}\) is a set of variables on backdoor paths from \(D\) to \(Y\).
  • We examine relationship between \(D\) and \(Y\), within groups of cases where values of \(\mathbf{X_i}\) are the same (hold value of backdoor paths constant)

Conditioning Assumptions

1. Ignorability/Conditional Independence: within strata of \(X\), potential outcomes of \(Y\) must be independent of \(D\) - all ‘backdoor’ paths are blocked; no conditioning on colliders

2. Positivity/Common Support: for all values of treatment \(d\) in \(D\) and all values of \(x\) in \(X\): \(Pr(D = d | X = x) > 0\) and \(Pr(D = d | X = x) < 1\) - there must be variation in treatment within every stratum of \(X\)
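In R, positivity can be eyeballed with a cross-tabulation of treatment by strata. A sketch with made-up strata and treatment probabilities:

```r
# Sketch: check that every stratum of X contains both treated and control.
set.seed(572)
x_strata <- sample(c("low", "mid", "high"), 600, replace = TRUE)
p <- c(low = 0.2, mid = 0.5, high = 0.8)[x_strata]  # made-up Pr(D = 1 | X)
d <- rbinom(600, 1, p)
tab <- table(x_strata, d)
tab                       # both columns (d = 0 and d = 1) occupied in each row
apply(tab > 0, 1, all)    # TRUE for every stratum -> positivity is plausible
```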

Omitted Variable Bias

Link to DAGs:

  • OVB solved when we “block” backdoor paths from \(D\) to \(Y\).

Link to conditioning:

  • OVB is a result of conditional independence assumption being wrong

Link to OLS assumptions:

  • unblocked backdoor paths \(\to\) \(D\) not independent of \(\epsilon\)

Exercise:

  1. Brainstorm a list of variables that may be causally linked to anti-Rohingya Muslim attitudes in Myanmar (and the direction of the effect)
  2. Add these variables (and any backdoor paths) to a DAG that initially includes just this link: \(\text{Higher Income} \xrightarrow{\text{Decreases}} \text{Anti-Muslim Attitudes}\)
  3. Imagine we want to estimate the causal effect of income on Anti-Muslim prejudice:
    • which of the variables in your DAG would produce OVB if excluded from the regression? Would it induce upward or downward bias in the estimated effect of income?
    • which of the variables in your DAG WOULD NOT produce OVB?

Other Biases

(1) Omitted Variable Bias

  • Excluding some variable \(X\) that should be in the model
  • Induces bias if \(X\) is causally related to both \(D\) and \(Y\) (\(X\) is a confounder)
  • We need to make a big assumption that we’ve included all relevant confounding variables: conditional independence assumption or ignorability

Potential Pitfalls

Even if we included all variables on backdoor path between \(D\) and \(Y\), we could have problems getting an unbiased estimate of the causal effect using regression:

  • we typically assume the model is linear and additive.
  • but the world might be non-linear and interactive
  • our decisions about how to specify the math of regression equation can lead to bias

(2) Interpolation Bias:

Typically: we approximate the relationship between the variables in \(X\), \(D\), and \(Y\) as additive and linear. If the linear approximation is wrong, we have bias.

  • By forcing the relationship between \(X\) and \(D\) to be linear and additive, conditioning on \(X\) may not remove non-linear association between \(X\) and \(D\) or between \(X\) and \(Y\) \(\to\) bias.
  • This unmodeled relationship becomes part of \(\epsilon_i\) (because \(X\) affects \(Y\)), and it will not be independent of \(D_i\) (because there is a non-linear association between \(X\) and \(D\)).
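A simulation sketch of this mechanism (hypothetical functional forms, not course data): a confounder that operates through \(X^2\) is not removed by adjusting for \(X\) linearly, while a more flexible specification recovers the true effect.

```r
# Sketch: X confounds D non-linearly (through X^2); the true effect of D is 1.
set.seed(572)
n <- 2000
x <- rnorm(n)
d <- rbinom(n, 1, plogis(x^2 - 1))     # treatment depends on X^2
y <- d + x^2 + rnorm(n)                # X affects Y through X^2
coef(lm(y ~ d + x))[["d"]]             # linear adjustment: biased away from 1
coef(lm(y ~ d + x + I(x^2)))[["d"]]    # flexible adjustment: close to 1
```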

Example: Interpolation

Let’s return to the same example we used above: do hours increase earnings?

Let’s say the model we want to estimate is:

\[Y_i = \beta_0 + \beta_1 Hours_i + \beta_2 Female_i + \beta_3 Age_i + \beta_4 Law_i + \epsilon_i\]

And we want to find \(\beta_1\) with \(\widehat{\beta_1}\)

  • how are we assuming linearity?
  • how are we assuming additivity?

Example: Interpolation

Questions to ask:

  • Is relationship between age and hours, age and income earned linear?
  • Are the effects of sex and age additive? (Relationship between age and hours worked the same for men and women?)

Example: Interpolation

Assuming additivity and linearity:

## 
## Call:
## lm(formula = INCEARN ~ UHRSWORK + SEX + AGE + LAW, data = acs_data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -314852  -89604  -28382   71587 1065811 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   3782.10    2295.92   1.647   0.0995 .  
## UHRSWORK      1301.77      27.66  47.062   <2e-16 ***
## SEX         -40411.28     683.32 -59.140   <2e-16 ***
## AGE           3377.68      30.47 110.853   <2e-16 ***
## LAW         -48942.34     647.40 -75.598   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 126900 on 166839 degrees of freedom
## Multiple R-squared:  0.1506, Adjusted R-squared:  0.1506 
## F-statistic:  7397 on 4 and 166839 DF,  p-value: < 2.2e-16

Assuming linearity, not additivity

## 
## Call:
## lm(formula = INCEARN ~ UHRSWORK + SEX * AGE + LAW, data = acs_data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -316004  -89725  -28458   71349 1064913 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    358.37    2477.51   0.145 0.884988    
## UHRSWORK      1300.22      27.66  47.002  < 2e-16 ***
## SEX         -29995.51    2914.29 -10.293  < 2e-16 ***
## AGE           3452.76      36.68  94.135  < 2e-16 ***
## LAW         -48949.01     647.38 -75.611  < 2e-16 ***
## SEX:AGE       -238.80      64.95  -3.677 0.000237 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 126900 on 166838 degrees of freedom
## Multiple R-squared:  0.1507, Adjusted R-squared:  0.1507 
## F-statistic:  5920 on 5 and 166838 DF,  p-value: < 2.2e-16

Assuming neither linearity nor additivity

## 
## Call:
## lm(formula = INCEARN ~ UHRSWORK + as.factor(SEX) * as.factor(AGE) + 
##     LAW, data = acs_data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -317309  -87614  -26776   70186 1085490 
## 
## Coefficients:
##                                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                       14471.10   11240.80   1.287 0.197966    
## UHRSWORK                           1453.56      27.04  53.758  < 2e-16 ***
## as.factor(SEX)1                    -222.84   14509.56  -0.015 0.987746    
## as.factor(AGE)26                   -227.22   12367.78  -0.018 0.985342    
## as.factor(AGE)27                  -4093.90   11714.05  -0.349 0.726725    
## as.factor(AGE)28                    857.74   11550.31   0.074 0.940803    
## as.factor(AGE)29                   6050.51   11471.60   0.527 0.597893    
## as.factor(AGE)30                  17439.54   11428.81   1.526 0.127030    
## as.factor(AGE)31                  29153.11   11417.68   2.553 0.010671 *  
## as.factor(AGE)32                  47053.94   11396.85   4.129 3.65e-05 ***
## as.factor(AGE)33                  65831.26   11393.78   5.778 7.58e-09 ***
## as.factor(AGE)34                  85911.22   11386.22   7.545 4.54e-14 ***
## as.factor(AGE)35                 101013.05   11376.88   8.879  < 2e-16 ***
## as.factor(AGE)36                 112628.62   11378.39   9.898  < 2e-16 ***
## as.factor(AGE)37                 125906.23   11367.24  11.076  < 2e-16 ***
## as.factor(AGE)38                 138508.50   11360.78  12.192  < 2e-16 ***
## as.factor(AGE)39                 148459.13   11369.87  13.057  < 2e-16 ***
## as.factor(AGE)40                 150215.80   11354.72  13.229  < 2e-16 ***
## as.factor(AGE)41                 155499.72   11359.63  13.689  < 2e-16 ***
## as.factor(AGE)42                 155370.17   11361.77  13.675  < 2e-16 ***
## as.factor(AGE)43                 161180.51   11347.70  14.204  < 2e-16 ***
## as.factor(AGE)44                 161984.26   11349.40  14.272  < 2e-16 ***
## as.factor(AGE)45                 171542.30   11348.39  15.116  < 2e-16 ***
## as.factor(AGE)46                 170424.45   11354.32  15.010  < 2e-16 ***
## as.factor(AGE)47                 172392.33   11342.30  15.199  < 2e-16 ***
## as.factor(AGE)48                 165000.71   11339.02  14.552  < 2e-16 ***
## as.factor(AGE)49                 168936.32   11341.20  14.896  < 2e-16 ***
## as.factor(AGE)50                 168572.39   11328.47  14.880  < 2e-16 ***
## as.factor(AGE)51                 167874.54   11331.84  14.814  < 2e-16 ***
## as.factor(AGE)52                 170229.40   11320.93  15.037  < 2e-16 ***
## as.factor(AGE)53                 165777.15   11327.22  14.635  < 2e-16 ***
## as.factor(AGE)54                 165600.43   11321.66  14.627  < 2e-16 ***
## as.factor(AGE)55                 164665.50   11320.25  14.546  < 2e-16 ***
## as.factor(AGE)56                 161668.83   11321.77  14.279  < 2e-16 ***
## as.factor(AGE)57                 164531.97   11327.19  14.525  < 2e-16 ***
## as.factor(AGE)58                 162783.79   11327.23  14.371  < 2e-16 ***
## as.factor(AGE)59                 158289.60   11328.84  13.972  < 2e-16 ***
## as.factor(AGE)60                 156791.72   11334.20  13.834  < 2e-16 ***
## as.factor(AGE)61                 154631.44   11354.34  13.619  < 2e-16 ***
## as.factor(AGE)62                 155696.84   11367.82  13.696  < 2e-16 ***
## as.factor(AGE)63                 158144.67   11390.81  13.884  < 2e-16 ***
## as.factor(AGE)64                 151768.32   11408.70  13.303  < 2e-16 ***
## LAW                              -46562.08     630.79 -73.815  < 2e-16 ***
## as.factor(SEX)1:as.factor(AGE)26   3059.47   16280.38   0.188 0.850936    
## as.factor(SEX)1:as.factor(AGE)27    985.39   15382.64   0.064 0.948924    
## as.factor(SEX)1:as.factor(AGE)28  -2081.37   15130.61  -0.138 0.890588    
## as.factor(SEX)1:as.factor(AGE)29  -3610.69   15040.09  -0.240 0.810276    
## as.factor(SEX)1:as.factor(AGE)30  -2252.62   14989.00  -0.150 0.880540    
## as.factor(SEX)1:as.factor(AGE)31  -5006.74   14987.30  -0.334 0.738330    
## as.factor(SEX)1:as.factor(AGE)32  -7235.96   14961.95  -0.484 0.628653    
## as.factor(SEX)1:as.factor(AGE)33 -14160.90   14968.86  -0.946 0.344138    
## as.factor(SEX)1:as.factor(AGE)34 -20578.40   14972.40  -1.374 0.169312    
## as.factor(SEX)1:as.factor(AGE)35 -28186.14   14963.43  -1.884 0.059612 .  
## as.factor(SEX)1:as.factor(AGE)36 -32074.22   14986.75  -2.140 0.032342 *  
## as.factor(SEX)1:as.factor(AGE)37 -37691.34   14977.71  -2.516 0.011854 *  
## as.factor(SEX)1:as.factor(AGE)38 -46296.67   14986.83  -3.089 0.002008 ** 
## as.factor(SEX)1:as.factor(AGE)39 -46408.58   15003.91  -3.093 0.001981 ** 
## as.factor(SEX)1:as.factor(AGE)40 -48159.27   14997.09  -3.211 0.001322 ** 
## as.factor(SEX)1:as.factor(AGE)41 -45257.21   15013.43  -3.014 0.002575 ** 
## as.factor(SEX)1:as.factor(AGE)42 -43580.81   15023.96  -2.901 0.003723 ** 
## as.factor(SEX)1:as.factor(AGE)43 -46269.87   15012.33  -3.082 0.002056 ** 
## as.factor(SEX)1:as.factor(AGE)44 -46987.35   15014.14  -3.130 0.001751 ** 
## as.factor(SEX)1:as.factor(AGE)45 -60883.51   15012.35  -4.056 5.00e-05 ***
## as.factor(SEX)1:as.factor(AGE)46 -58627.82   15035.48  -3.899 9.65e-05 ***
## as.factor(SEX)1:as.factor(AGE)47 -59266.58   15042.14  -3.940 8.15e-05 ***
## as.factor(SEX)1:as.factor(AGE)48 -49249.48   15036.11  -3.275 0.001055 ** 
## as.factor(SEX)1:as.factor(AGE)49 -48014.24   15044.98  -3.191 0.001416 ** 
## as.factor(SEX)1:as.factor(AGE)50 -55003.47   15034.24  -3.659 0.000254 ***
## as.factor(SEX)1:as.factor(AGE)51 -56290.13   15039.47  -3.743 0.000182 ***
## as.factor(SEX)1:as.factor(AGE)52 -59102.48   15047.20  -3.928 8.58e-05 ***
## as.factor(SEX)1:as.factor(AGE)53 -52689.16   15048.52  -3.501 0.000463 ***
## as.factor(SEX)1:as.factor(AGE)54 -55800.48   15062.39  -3.705 0.000212 ***
## as.factor(SEX)1:as.factor(AGE)55 -50918.02   15096.67  -3.373 0.000744 ***
## as.factor(SEX)1:as.factor(AGE)56 -50530.06   15125.25  -3.341 0.000836 ***
## as.factor(SEX)1:as.factor(AGE)57 -53462.32   15142.56  -3.531 0.000415 ***
## as.factor(SEX)1:as.factor(AGE)58 -50651.21   15179.63  -3.337 0.000848 ***
## as.factor(SEX)1:as.factor(AGE)59 -46199.91   15263.25  -3.027 0.002471 ** 
## as.factor(SEX)1:as.factor(AGE)60 -40946.54   15317.89  -2.673 0.007516 ** 
## as.factor(SEX)1:as.factor(AGE)61 -41701.21   15414.78  -2.705 0.006826 ** 
## as.factor(SEX)1:as.factor(AGE)62 -49152.09   15505.31  -3.170 0.001525 ** 
## as.factor(SEX)1:as.factor(AGE)63 -51645.68   15757.81  -3.277 0.001048 ** 
## as.factor(SEX)1:as.factor(AGE)64 -47218.46   15938.36  -2.963 0.003051 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 123500 on 166762 degrees of freedom
## Multiple R-squared:  0.1963, Adjusted R-squared:  0.1959 
## F-statistic: 502.9 on 81 and 166762 DF,  p-value: < 2.2e-16

What is wrong with interpolation bias?

  • In conditioning, we want to compare \(Y_i(D_i = 1)|X_i\) to \(Y_i(D_i = 0)|X_i\) for cases where values of confounding variables \(X_i = x\) are the same
  • In regression, we compare \(Y_i(D_i = 1)|X_i\) against the linear prediction \(\widehat{Y}(D_i = 0)|X_i\).
  • This approximation may fail, sometimes spectacularly…

transparent dots indicate \(\widehat{Y}\) interpolated by regression.

                 Model 1      Model 2
(Intercept)    11.664***     2.035***
                (0.928)      (0.311)
d              −6.731***     0.929**
                (1.092)      (0.307)
x                0.338      −0.559***
                (0.245)      (0.060)
x2                           1.025***
                             (0.024)
Num.Obs.           100          100
R2               0.281        0.963
+ p < 0.1, * p < 0.05, ** p < 0.01, *** p < 0.001

Actual (black) vs Regression (red) weights

Interpolation Bias:

When we approximate the relationship of confounding variables \(X\) with \(D\) and \(Y\) through linearity (or any imposed functional form) and additivity, we may generate interpolation bias even if there is no omitted variable bias.

Interpolation bias is relatively easy to diagnose:

  • Scatterplot \(D\) with \(X\)
  • Choose more flexible model (non-linear/interactive)
  • Linear/additive approximation may be acceptable because least squares is a linear approximation of non-linear functions within the region where we have data.

(3) Extrapolation Bias

Typically: we approximate the relationship between variables in \(X\) and \(D\) to be additive and linear.

If \(D = 1\) never occurs for certain values of \(X\) (e.g. \(X > 0\)), the regression model will use additivity and linearity (or ANY functional form) to extrapolate a predicted value of what \(D = 1\) would look like when \(X > 0\).

  • King and Zeng show that extrapolations are very sensitive to model choice: small changes in assumptions create large changes in estimates.

Example: Extrapolation

Does being a democracy cause a country to provide better public goods?

We decide to estimate the following regression model. Let’s assume that there is no omitted variable bias.

\[Public \ Goods_i = \beta_0 + \beta_1 Democracy_i + \beta_2 ln(Per \ Capita \ GDP_i) + \epsilon_i\]

Download simulated data from here:

https://pastebin.com/zW4wiqxp

1. Estimate \(Public \ Goods_i = \beta_0 + \beta_1 Democracy_i + \beta_2 ln(Per \ Capita \ GDP_i) + \epsilon_i\) using the lm function. What is the effect of democracy on public goods?

2. What does this model assume about the relationship between per capita GDP and public goods for democracies? For non-democracies?

3. Plot public goods on GDP per capita, grouping by democracy

ggplot(data_9, aes(x = l_GDP_per_capita, y = public_goods, colour = Democracy %>% as.factor)) + geom_point()

4. What is the true functional form of the relationship between GDP per capita and public goods for democracies? For non-democracies where GDP per capita is \(> 0\)?

5. Create a new variable for per capita GDP \(> 0\).

6. Repeat the regression from (1) for cases where l_GDP_per_capita \(< 0\). Why is the estimate of \(\widehat{\beta_1}\) different?

In this example:

We observe no non-democracies with GDP per capita as high as some democracies, but must condition on GDP per capita.

Options:

  • Using regression (\(1\) on previous slide): we linearly and additively extrapolated from the model what public goods for non-democracies with high per capita GDP would be (where we have no data, only the model)

or

  • Using regression (\(6\) on previous slide): we restricted our analysis to the region of the data with common support (range of GDP per capita values that contain both democracies and non-democracies). Linear interpolation, but the model is not extrapolating to regions without data.
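A simulation in the spirit of this example (a hypothetical data-generating process, not the pastebin data) shows both options: non-democracies are never observed at high GDP, and the relationship is non-linear there, so the full-sample linear model must extrapolate, while the common-support subsample does not.

```r
# Sketch: true democracy effect is 1; non-democracies only exist where g < 0,
# and the GDP relationship is non-linear only where g > 0 (all made up).
set.seed(572)
n <- 4000
g <- runif(n, -2, 2)                          # centered log GDP per capita
dem <- as.integer(g > 0 | runif(n) < 0.3)     # no non-democracies when g > 0
pg <- dem + g + 3 * (g > 0) * g^2 + rnorm(n)  # non-linear where g > 0
coef(lm(pg ~ dem + g))[["dem"]]               # extrapolates; far from 1
coef(lm(pg ~ dem + g, subset = g < 0))[["dem"]]  # common support; close to 1
```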

Conditioning and Regression:

Problems of extrapolation relate to assumption of Positivity or Common Support in conditioning:

  • For conditioning: we assume that we have variation in \(D\) within every stratum of \(X\).
  • In regression, absence of common support does not break the analysis. We just make model-based extrapolations…
    • in contrast to matching: no common support means no conditioning

Conditioning with Regression

Omitted Variable Bias

Solutions?

  • No easy fixes
  • DAGs, justification
  • in the future, will discuss sensitivity tests

Inter/Extrapolation Bias

Solutions?

  • Restrict analyses to cases where we have overlap in conditioning variables (extrapolation)
  • Fewer functional form assumptions (interpolation/extrapolation)

Saturated Regression

One solution to both extrapolation and interpolation bias is saturated regression

  • a dummy variable for every unique combination of values for conditioning variables \(X\).
  • we now compare difference in averages of treated/untreated within each strata of \(X\)
  • returns an average causal effect, but weighted by \(N\) and the variance of \(D\) within each stratum of \(X\).
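That weighting can be sketched in R (simulated data; three made-up strata with effects 1, 2, and 3): the coefficient from a regression with stratum dummies matches the \(N_s \times Var(D)\)-weighted average of stratum effects, not the simple average.

```r
# Sketch: stratum effects are 1, 2, 3; an unweighted average would be 2.
set.seed(572)
n <- 20000
s <- sample(1:3, n, replace = TRUE)
p   <- c(0.1, 0.5, 0.5)[s]   # treatment probability varies by stratum
tau <- c(1, 2, 3)[s]         # treatment effect varies by stratum
d <- rbinom(n, 1, p)
y <- tau * d + s + rnorm(n)
b <- coef(lm(y ~ d + factor(s)))[["d"]]
# regression weight on each stratum is proportional to N_s * Var(D) within it
w <- tapply(d, s, function(z) length(z) * var(z))
c(saturated = b, weighted = sum(w * 1:3) / sum(w))  # these agree; neither is 2
```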

Saturated Regression

Pros:

  • no possibility of interpolation: model is linear and additive in \(X\), but “assumption-free” because all unique values of \(X\) have their own means.
  • zero weight on strata of \(X\) with all “treatment”/all “control” (no extrapolation)
    • why is this the case?

naive linear
(Intercept) 0.849*** (0.004) 0.892*** (0.008)
svy_sh_income_rc −0.043*** (0.002) −0.030*** (0.002)
svy_sh_female_rc 0.040*** (0.003)
svy_sh_age_rc_n 0.006*** (0.001)
svy_sh_education_rc_n −0.015*** (0.001)
svy_sh_ethnicity_rcChin 0.034*** (0.007)
svy_sh_ethnicity_rcMon 0.022** (0.008)
svy_sh_ethnicity_rcKachin 0.009 (0.013)
svy_sh_ethnicity_rcKayah 0.042*** (0.008)
svy_sh_ethnicity_rcKayin 0.018*** (0.005)
svy_sh_ethnicity_rcRakhine 0.067*** (0.012)
svy_sh_ethnicity_rcShan 0.007+ (0.004)
svy_sh_ethnicity_rcMixed ancestry 0.001 (0.015)
svy_sh_ethnicity_rcNon-TYT −0.089*** (0.007)
svy_sh_religion_rcChristian −0.090*** (0.005)
svy_sh_religion_rcMuslim −0.570*** (0.007)
svy_sh_religion_rcHindu −0.103*** (0.013)
svy_sh_religion_rcAnimist 0.026 (0.025)
svy_sh_religion_rcAthiest −0.585*** (0.051)
svy_sh_religion_rcOther −0.577*** (0.100)
svy_sh_profession_type_rc2 0.004 (0.005)
svy_sh_profession_type_rc3 0.006 (0.006)
svy_sh_profession_type_rc4 −0.015** (0.005)
svy_sh_profession_type_rc5 0.010* (0.005)
svy_sh_profession_type_rc6 0.020* (0.009)
svy_sh_income_source_rcDay Labour −0.001 (0.006)
svy_sh_income_source_rcRetired −0.030* (0.014)
svy_sh_income_source_rcService Provider −0.011+ (0.006)
svy_sh_income_source_rcShop Owner −0.009 (0.006)
svy_sh_income_source_rcStaff −0.001 (0.006)
svy_sh_income_source_rcTrader −0.030*** (0.008)
Num.Obs. 28350 28179
R2 0.017 0.344

Saturated Regression

Example strata (unique combinations of values of the conditioning variables):
0:18-27:Graduate:Shan:Buddhist:1:Agriculture
1:18-27:High:Mixed ancestry:Christian:1:Staff
1:28-37:Graduate:Kayah:Christian:2:Staff
1:18-27:High:Kayah:Christian:1:Agriculture
1:18-27:Graduate:Shan:Buddhist:1:Day Labour
1:28-37:High:Kayah:Buddhist:4:Staff
1:18-27:High:Kayah:Christian:2:Day Labour
1:18-27:High:Kayah:Christian:1:Day Labour
0:18-27:High:Kayah:Christian:3:Day Labour
1:18-27:High:Shan:Christian:2:Staff

Saturated Regression

Effect of income on prejudice: saturated regression

## NOTE: 1,600 observations removed because of NA values (LHS: 1,580, RHS: 27).

naive linear saturated
(Intercept) 0.849*** (0.004) 0.892*** (0.008)
svy_sh_income_rc −0.043*** (0.002) −0.030*** (0.002) −0.025*** (0.003)
svy_sh_female_rc 0.040*** (0.003)
svy_sh_age_rc_n 0.006*** (0.001)
svy_sh_education_rc_n −0.015*** (0.001)
svy_sh_ethnicity_rcChin 0.034*** (0.007)
svy_sh_ethnicity_rcMon 0.022** (0.008)
svy_sh_ethnicity_rcKachin 0.009 (0.013)
svy_sh_ethnicity_rcKayah 0.042*** (0.008)
svy_sh_ethnicity_rcKayin 0.018*** (0.005)
svy_sh_ethnicity_rcRakhine 0.067*** (0.012)
svy_sh_ethnicity_rcShan 0.007+ (0.004)
svy_sh_ethnicity_rcMixed ancestry 0.001 (0.015)
svy_sh_ethnicity_rcNon-TYT −0.089*** (0.007)
svy_sh_religion_rcChristian −0.090*** (0.005)
svy_sh_religion_rcMuslim −0.570*** (0.007)
svy_sh_religion_rcHindu −0.103*** (0.013)
svy_sh_religion_rcAnimist 0.026 (0.025)
svy_sh_religion_rcAthiest −0.585*** (0.051)
svy_sh_religion_rcOther −0.577*** (0.100)
svy_sh_profession_type_rc2 0.004 (0.005)
svy_sh_profession_type_rc3 0.006 (0.006)
svy_sh_profession_type_rc4 −0.015** (0.005)
svy_sh_profession_type_rc5 0.010* (0.005)
svy_sh_profession_type_rc6 0.020* (0.009)
svy_sh_income_source_rcDay Labour −0.001 (0.006)
svy_sh_income_source_rcRetired −0.030* (0.014)
svy_sh_income_source_rcService Provider −0.011+ (0.006)
svy_sh_income_source_rcShop Owner −0.009 (0.006)
svy_sh_income_source_rcStaff −0.001 (0.006)
svy_sh_income_source_rcTrader −0.030*** (0.008)
Num.Obs. 28350 28179 28350
R2 0.017 0.344 0.533
R2 Within 0.006
R2 Pseudo
Std.Errors by: saturated

Saturated Regression

We can use regression for conditioning without interpolation bias or extrapolation bias, but

  • we still make conditional independence assumption
  • saturated regression suffers from the curse of dimensionality (usually requires a lot of \(N\))
  • returns a variance-weighted effect, may not be what we want (this is fixable, though)